feat: add observability stack and kubelogstream to coder app #113
Open
sharkymark wants to merge 16 commits into main
Adds comprehensive monitoring and Kubernetes event streaming for better visibility into Coder deployments and workspace troubleshooting.
Components:
- Kubelogstream: streams pod/event logs to workspace startup logs
- Observability: full stack with Prometheus, Grafana, Loki, Alertmanager
Changes:
- Add component 5 (kubelogstream) with Helm values
- Add component 6 (observability) with full monitoring stack
- Configure Coder to expose Prometheus metrics and agent stats
- Add coder-observability namespace to sandbox
- Enhance RDS secrets action to create secrets in both namespaces
- Use ebs-auto storage class for all persistent volumes
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Enables public access to Grafana dashboards via subdomain with proper authentication instead of port-forwarding.
Changes:
- Generate random Grafana admin password in RDS secrets action
- Store credentials in AWS Secrets Manager (grafana-admin-{install-id})
- Add grafana_password action to retrieve credentials for admins
- Configure Grafana ingress for subdomain: grafana.{install-id}.nuon.run
- Disable anonymous authentication (require login)
- Update README with step-by-step access instructions
An admin retrieves the credentials by running the grafana_password action in the Nuon UI, then logs in at the subdomain with username 'admin' and the generated password.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Removes the manual CNAME creation step by adding the wildcard domain to the external-dns annotation on the ALB ingress.
Changes:
- Add *.{domain} to external-dns hostname annotation in ALB template
- Update README to note wildcard DNS is now automatic
- Enables workspace web apps and port forwarding without manual DNS config
external-dns now creates both the main domain and wildcard CNAME records automatically, pointing to the ALB DNS name.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
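As a rough illustration of the wildcard change, the sketch below builds the hostname annotation value for a main domain plus its wildcard. It assumes the ALB template uses external-dns's external-dns.alpha.kubernetes.io/hostname annotation with a comma-separated host list; the helper name and example domain are hypothetical, not taken from the chart.

```python
HOSTNAME_ANNOTATION = "external-dns.alpha.kubernetes.io/hostname"


def alb_dns_annotations(domain: str) -> dict[str, str]:
    """Return ingress annotations asking external-dns to publish both the main
    domain and a wildcard record for workspace apps (hypothetical helper)."""
    domain = domain.strip().rstrip(".")
    if not domain:
        raise ValueError("domain must be non-empty")
    # external-dns accepts several hostnames in one annotation, comma-separated.
    return {HOSTNAME_ANNOTATION: f"{domain},*.{domain}"}
```

For example, alb_dns_annotations("coder.example.com") yields a single annotation whose value is "coder.example.com,*.coder.example.com", so one ingress covers both the Coder dashboard and workspace app subdomains.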
Moves Grafana from a subdomain (which required a new ALB) to path-based routing on the existing Coder ALB, reducing cost and complexity.
Changes:
- Add group.name annotation to Coder ALB for sharing
- Configure Grafana ingress to join same ALB group
- Serve Grafana from /grafana path with serve_from_sub_path
- Update URLs in README and password action
- Set group.order=200 to route after Coder paths
Result: One ALB instead of two, saves ~$20/month. Grafana accessible at https://{domain}/grafana.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
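Serving Grafana from /grafana generally needs two Grafana server settings to agree: root_url must include the sub-path, and serve_from_sub_path must be true. A minimal sketch of such a values fragment follows; the "grafana" subchart key, the helper name and the domain are assumptions, not confirmed by this PR.

```python
def grafana_subpath_values(domain: str, sub_path: str = "/grafana") -> dict:
    """Build a hypothetical Helm values fragment for serving Grafana under a sub-path."""
    if not sub_path.startswith("/"):
        raise ValueError("sub_path must start with '/'")
    sub_path = sub_path.rstrip("/")
    return {
        "grafana": {  # assumed subchart key
            "grafana.ini": {
                "server": {
                    "domain": domain,
                    # root_url carries the sub-path so Grafana generates correct links.
                    "root_url": f"https://{domain}{sub_path}/",
                    # The ALB forwards /grafana/... unchanged (it does not rewrite
                    # paths), so Grafana must handle the prefix itself.
                    "serve_from_sub_path": True,
                }
            }
        }
    }
```

Keeping both settings in one helper makes it harder to change the path in root_url without also serving from that path.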
Deletes the remove-gp2-default action, which was only needed for troubleshooting a previous storage class issue.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…lback
Makes the storage class the default from the start rather than relying on an action.
Changes:
- Set is_default_class=true in sandbox.tfvars
- Remove the automatic trigger from the default-storage-class action
- Keep the action as manual-only, for troubleshooting if needed
The storage class is now the default from sandbox creation, with the manual action available if it ever needs to be re-applied.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
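The remove-gp2-default and default-storage-class actions both concern which StorageClass is the cluster default. When troubleshooting, it can help to check that exactly one class carries the standard storageclass.kubernetes.io/is-default-class annotation. This sketch works on plain dicts shaped like the items from kubectl get storageclass -o json; the helper names and sample data are hypothetical, and whether is_default_class=true maps to this annotation is an assumption.

```python
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_storage_classes(items: list[dict]) -> list[str]:
    """Return names of StorageClass objects annotated as the cluster default."""
    names = []
    for sc in items:
        annotations = sc.get("metadata", {}).get("annotations") or {}
        if annotations.get(DEFAULT_ANNOTATION) == "true":
            names.append(sc["metadata"]["name"])
    return names


def check_single_default(items: list[dict]) -> str:
    """Return the single default class name, or raise if there are zero or several."""
    names = default_storage_classes(items)
    if len(names) != 1:
        raise RuntimeError(f"expected exactly one default StorageClass, found {names}")
    return names[0]
```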
…g env var
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from cc2dca2 to 0560f22.
Replace endpoint/port with address/db_instance_port to match the actual rds_cluster_coder outputs, fixing a template rendering failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
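The endpoint/port fix is easy to see in miniature: a template that reads output names the module does not export fails at render time. The sketch below uses a Python dict as a stand-in for the rds_cluster_coder outputs and builds a PostgreSQL URL of the kind Coder reads from CODER_PG_CONNECTION_URL. The helper, user, password, database name and sslmode=require are hypothetical, and the real template engine is not Python.

```python
from urllib.parse import quote


def coder_pg_url(outputs: dict, user: str, password: str, database: str = "coder") -> str:
    """Build a postgres:// URL from (hypothetical) RDS component outputs.

    Reading outputs["address"] / outputs["db_instance_port"] raises KeyError when
    the names are wrong, analogous to the rendering failure the commit fixes.
    """
    host = outputs["address"]
    port = outputs["db_instance_port"]
    # Percent-encode credentials so characters such as '@' or ':' cannot break the URL.
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}?sslmode=require"
    )
```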
…edence
Without group.order, the main ALB ingress's catch-all '/' rule (default group.order of 0) intercepts all traffic, including /grafana, before Grafana's ingress rule can match. Setting group.order=1000 on the main ALB ingress moves its rules to the end of the evaluation order, so more specific paths from other ingresses (e.g. Grafana at /grafana) are evaluated first.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
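The ordering problem can be modeled in a few lines. This is a simplified sketch, not the controller's algorithm: rules are evaluated in ascending group.order, the first matching path prefix wins, and ties keep their list order (the real controller breaks ties differently). The orders 0, 200 and 1000 come from this PR's commits; treating Grafana as still at 200 after the fix is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    ingress: str
    order: int        # alb.ingress.kubernetes.io/group.order (default 0)
    path_prefix: str  # "/" acts as the catch-all
    backend: str


def route(rules: list[Rule], request_path: str) -> Optional[str]:
    """Return the backend of the first matching rule in ascending group.order."""
    # sorted() is stable, so rules with equal order keep their list order here.
    for rule in sorted(rules, key=lambda r: r.order):
        if request_path.startswith(rule.path_prefix):
            return rule.backend
    return None


before = [Rule("coder", 0, "/", "coder"), Rule("grafana", 200, "/grafana", "grafana")]
after = [Rule("coder", 1000, "/", "coder"), Rule("grafana", 200, "/grafana", "grafana")]
```

With the "before" rules, route(before, "/grafana/login") returns "coder" because the catch-all is checked first; with the "after" rules the same request reaches "grafana", while route(after, "/api/v2") still goes to "coder".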
target-type is not inherited from the group leader ingress; each ingress must declare it. Without it, the controller defaults to instance mode: Grafana's ClusterIP service has no NodePort, so the port resolves to 0 and CreateTargetGroup fails.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without this annotation, the ALB controller places the /grafana rule on the HTTP:80 listener only. The main ALB ingress uses HTTPS:443, so /grafana was never matched on that listener; Coder's /* caught it first.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
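Taken together with the target-type commit, the Grafana ingress has to restate some settings itself rather than inherit them from the Coder ingress in the same group. The following is a hedged sketch of that annotation set. It assumes the unnamed annotation in the listener commit is alb.ingress.kubernetes.io/listen-ports (the commit title is truncated) and uses a hypothetical group name.

```python
import json

PREFIX = "alb.ingress.kubernetes.io/"
# Settings each ingress in the group declares for itself (per the commits above).
PER_INGRESS_KEYS = ("target-type", "listen-ports")


def grafana_ingress_annotations(group_name: str) -> dict[str, str]:
    """Annotations for an ingress joining an existing ALB IngressGroup (sketch)."""
    return {
        PREFIX + "group.name": group_name,
        # ClusterIP services have no NodePort, so register pod IPs directly.
        PREFIX + "target-type": "ip",
        # Attach rules to HTTPS:443 rather than HTTP:80, where /grafana had landed.
        PREFIX + "listen-ports": json.dumps([{"HTTPS": 443}]),
    }


def missing_per_ingress_keys(annotations: dict[str, str]) -> list[str]:
    """List per-ingress annotations that are absent and will not be inherited."""
    return [key for key in PER_INGRESS_KEYS if PREFIX + key not in annotations]
```

A check like missing_per_ingress_keys can run over every ingress that sets group.name, catching both failure modes before the controller does.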
The upstream coder-observability chart enforces password complexity. Generated passwords (base64, no special chars) fail the policy, blocking both initial login and grafana-cli password reset.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
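A generator that always includes every character class avoids the failure described above. The upstream chart's exact rules are not spelled out here, so this sketch assumes a common policy (lowercase, uppercase, digit and symbol, minimum length 12). The symbol set and function name are hypothetical choices.

```python
import secrets
import string

SYMBOLS = "!#%+-.=@_"  # hypothetical symbol set; avoids quotes, '$' and backslashes


def generate_password(length: int = 24) -> str:
    """Generate a password with at least one lowercase letter, uppercase letter,
    digit and symbol (assumed policy)."""
    if length < 12:
        raise ValueError("length must be at least 12")
    classes = [string.ascii_lowercase, string.ascii_uppercase, string.digits, SYMBOLS]
    # One guaranteed character from each class...
    chars = [secrets.choice(cls) for cls in classes]
    # ...then fill the rest from the combined alphabet.
    alphabet = "".join(classes)
    chars += [secrets.choice(alphabet) for _ in range(length - len(classes))]
    # Shuffle with a CSPRNG so the guaranteed characters are not always first.
    secrets.SystemRandom().shuffle(chars)
    return "".join(chars)
```

Leaving shell-sensitive characters out of the symbol set is a design choice that keeps the password safe to pass through env vars and CLI arguments such as grafana-cli.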
…tion
The upstream chart sets GF_SECURITY_DISABLE_INITIAL_ADMIN_CREATION=true, which prevents Grafana from creating the admin user from the existingSecret env vars. Override it to false so the admin user is created from the grafana-admin secret on first start.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract Grafana admin credential creation from rds_secrets into its own grafana_setup action, triggered post-deploy of the coder component. This ensures the grafana-admin secret exists before observability deploys, and keeps each action focused on a single concern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Trigger directly before observability deploys rather than post-coder; pre-deploy of observability is the correct lifecycle hook for pre-seeding secrets.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
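The grafana_setup action can be pictured as seeding a Kubernetes Secret named grafana-admin in the coder-observability namespace, paired with the GF_SECURITY_DISABLE_INITIAL_ADMIN_CREATION override. The admin-user/admin-password key names follow the Grafana Helm chart's defaults for admin.existingSecret and are an assumption here, as is the helper itself.

```python
import base64

# Env override so Grafana creates the admin user from the secret on first start.
GRAFANA_ENV_OVERRIDE = {"GF_SECURITY_DISABLE_INITIAL_ADMIN_CREATION": "false"}


def _b64(value: str) -> str:
    # Secret .data values must be base64-encoded.
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def grafana_admin_secret(password: str, user: str = "admin",
                         namespace: str = "coder-observability") -> dict:
    """Build a Secret manifest that must exist before the observability component deploys."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "grafana-admin", "namespace": namespace},
        "type": "Opaque",
        "data": {"admin-user": _b64(user), "admin-password": _b64(password)},
    }
```

Both halves are needed: the Secret alone is ignored while initial admin creation stays disabled, and the override alone has no credentials to create the user from.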
- rds_cluster_coder depends on rds_subnet (uses its subnet group id)
- application_load_balancer depends on certificate (uses its ARN)
- observability depends on application_load_balancer (Grafana joins ALB group)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
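The three dependency edges above determine a valid deploy order. A minimal sketch using Python's standard graphlib (3.9+): the component names come from the commit, but the dict representation only illustrates how an orchestrator might order them, not how Nuon does it.

```python
from graphlib import TopologicalSorter

# component -> set of components it depends on (edges listed in the commit)
DEPENDENCIES = {
    "rds_cluster_coder": {"rds_subnet"},
    "application_load_balancer": {"certificate"},
    "observability": {"application_load_balancer"},
}


def deploy_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every component follows its dependencies."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(deps).static_order())
```

Any order returned places rds_subnet before rds_cluster_coder and certificate before application_load_balancer before observability; independent chains may interleave.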